October 26, 2017

About this tutorial

Spatial Analysis begins with an exploration of the data.

First steps

  1. Mapping: to see its location and distribution

  2. Asking questions of, or querying, your data.

While aspatial queries include basic summary statistics of the attribute data, spatial queries explore the spatial aspects of your data.

Types of Spatial Queries

There are two key types of spatial queries

  • spatial measurement queries,
    • e.g. area, length, distance
  • spatial relationship queries,
    • e.g. what locations in A are also in B.

These types are often combined, e.g.

  • What is the area of A that is within B

Learning Goals

  • Identify the key R packages for working with spatial data

  • Introduce spatial data processing and queries in R

  • Introduce terminolgy, joys & challenges

  • Practice these operations in the context of a analysis of SF parks data

  • Provide sample code and references for participants to refer back to.

Geospatial Data Fundamentals

This tutorial assumes you already know the fundamentals of geographic data.

Getting Started

Let's dive into spatial analysis in R.

We will walk through an example workflow and discuss technical specifics and concepts as they arise in the context of this work.

Prepare your R workspace

# Clean environment
rm(list=ls())

# Set the working directory
setwd("~/Documents/Dlab/dlab_workshops/r-geospatial-f2017")

R Spatial Packages

Let's load the R packages we will use

library(sp)
library(rgdal)
library(rgeos)
library(tmap)
library(RColorBrewer)
library(dplyr)
## leaflet, ggplot2, ggmap, maptools, ??
## rgdal: version: 1.2-8, (SVN revision 663)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-4
## rgeos version: 0.3-23, (SVN revision 546)
##  GEOS runtime version: 3.6.1-CAPI-1.10.1 r0 
##  Linking to sp version: 1.2-4 
##  Polygon checking: TRUE

About the Packages

The R packages used in this tutorial provide the following functionality:

sp: Classes and methods for spatial data

rgdal: for importing, exporting and transforming spatial data

rgeos: for spatial operations and queries on geometric objects

tmap: for creating interactive web maps

RColorBrewer: for selecting predefined color palettes.

ggmap/ggplot2 for geocoding locations, mapping and plotting

The SP Package

The SP package is most commonly used to provide support for vector data objects in R.

Other packages that do things with spatial data typically build on these SP objects.

sp Objects

Three types of sp spatial objects that are commonly used in R are summarized below.

Vector Data Type SP Spatial Class SP Spatial Class with Attributes
Points SpatialPoints SpatialPointsDataFrame
Lines SpatialLines SpatialLinesDataFrame
Polygons SpatialPolygons SpatialPolygonsDataFrame


The Workshop Data

In this tutorial we will explore data about parks, neighborhoods and census tracts in San Francisco.

These data are all from the San Francisco OpenData Portal data.sfgov.org

We will start by loading the parks data, which we have subsetted this data to only include small neighborhood parks.

SF Parks

Spatial point data are commonly stored in CSV files.

Load the CSV file sf_neighborhood_parks.csv into a data frame.

parks <- read.csv("./data/sf_neighborhood_parks.csv", 
                  stringsAsFactors = FALSE)

Explore the Data

head(parks)
class(parks)
str(parks)
summary(parks)

Questions

  • What type of data are parks?

  • Are parks spatial data?

  • How many columns describe the parks?

  • How many parks are in the data set?

  • What columns include the coordinate data?

Creating a SpatialPointsDataFrame

Use the coordinates function from sp to convert the parks data frame to a SpatialPointsDataFrame.

This function will convert the data frame to a SpatialPointsDataFrame

Make spatial with coordinates

The coordinates function requires as input a vector that identifies the columns in the data frame that contain the x and y coordinates.

These coordinate are typically called lon or longitude or lat for latitude if they are geographic coordinates.

?coordinates

coordinates(parks) <- c("lon","lat") # or ~lon+lat

Explore the SPDF

Take a look at the SpatialPointsDataFrame object and note how it differs from the data frame.

head(parks)
class(parks)
str(parks)
summary(parks)

SpatialPointsDataFrame

You can see from str(parks) that the parks SPDF object is a collection of slots or components. The key ones are:

@data the attribute data describing each location

@coords the coordinates for each location

@bbox the min and max lon(x) and lat(y) coordinates that together define the minimum bounding box around the locations

@proj4string the coordinate reference system defintion as a string

Getting Help

# For more info
?SpatialPointsDataFrame

Explore the SPDF Slots

parks@bbox
head(parks@coords)
head(parks@data)
head(parks$ParkName)

Question

What is the CRS of the parks data?

Creating Maps

Create a simple map of the parks using the R base plot method.

plot(parks)

Interactive Maps with tmap

The tmap package provides powerful yet clear syntax for creating maps in R.

There are two tmap modes

  • tmap_mode('plot') - for static maps (default)

  • tmap_mode('view') - for interactive maps

tmap in Interactive mode

The tmap interactive mode brings desktop GIS like functionality to R.

Note only can you display the data but also

  • click-query the features to see the data attributes
  • overlay multple layers and click them
  • add layers in different coordinate reference systems (CRS)

If the CRSs are defined tmap supports on-the-fly CRS transformation.

Create an interactive map with tmap

tmap_mode("view")

# Start with a quick tmap or qtm
qtm(parks)
## tmap mode set to interactive viewing
## Warning: Currect projection of shape parks unknown. Long-lat (WGS84) is
## assumed.

Recap

  • We have loaded a csv file containing lat & lon coordinates for SF Parks as a data frame
  • Converted the data to a SpatialPointsDataFrame using sp::coordinates
  • We made a simple plot of the park points
  • Plotted the points on a static map using ggmap and on an interactive map using tmap
  • QUESTIONS?

Spatial Queries

Spatial Queries

Our scenario in this tutorial is an exploration of SF neighborhood parks.

Question 1. What neighborhood is each park in?

SF Neighborhoods Data

To answer this question let's load some data that delineate SF neighborhoods.

These data are in an ESRI Shapefile.

This is one of the most, if not the most common spatial vector data file formats.

Recall, an ESRI Shapefile is a set of 3 or more files that need to be kept in the same directory.

rgdal

We will use the rgdal library to load the neighborhoods shapefile.

The rgdal library is the most commonly used R library for importing and exporting spatial data.

For more info on working with rgdal to load different types of spatial data in R see this excellent tutorial by Zev Ross. In fact his blog has a wealth of information about working with spatial data in R.

Load a Shapefile

Use the rgdal command readOGR to load the file sf_nhoods.shp shapefile into a SpatialPolygonsDataFrame object called sf_nhoods.

#library(rgdal)

# read the sf_nhoods.shp file from the current working directory
sf_nhoods <- readOGR(dsn="./data", layer="sf_nhoods", 
                     stringsAsFactors = F)
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "sf_nhoods"
## with 41 features
## It has 1 fields

Explore the Data

class(sf_nhoods)
summary(sf_nhoods)
str(sf_nhoods@data)
head(sf_nhoods@data)

Questions

  • What is the class of the SF Neighborhoods data

  • How many features does it contain?

  • What attributes describe the data?

  • What is the CRS of the data?

Map the neighborhoods

Make an interactive tmap of the neigborhoods and parks.

Does this map answer are question What parks are in each Neighborhood?

tmap of parks and neighborhoods

tm_shape(sf_nhoods) + tm_polygons(col="beige") + tm_shape(parks) + tm_symbols(col="green")
## Warning: Currect projection of shape parks unknown. Long-lat (WGS84) is
## assumed.

What neighborhood is each park in?

Discussion

Adding labels to a static map or popups to an interactive map helps but it doesn't scale up well with more than a handful of features. Moreover, it does not provide any output that you can use for further analysis like a table that links the neighborhoods to the parks. For that we need to join the data from these two different layers. This is called a spatial join.

Spatial Join

A spatial join associates rows of data in one object with rows in another object based on the spatial relationship between the two objects.

A spatial join is based on the comparison of two sets of geometries in the same coordinate space.

This is called a spatial overlay.

Spatial overlay

Spatial overlay operations in R are implemented using the over function in the SP and rGEOS libraries.

Point-polygon overlay use SP::over

SpatialLines objects, or pairs of SpatialPolygons require package rgeos, and use gIntersects.

That's likely more detail than you need right now but the point here is that rgeos is the go-to library for vector geometric processing in R.

over

You can interperet over(x,y) as:

  • for each feature in X give me information about the first feature in Y at the corresponding location.

You can interperet over(x,y, returnList=TRUE) as:

  • for each feature in X give me information about the all features in Y at the corresponding location.

See ?over for details.

Layers & Features

The term feature is used to refer to a single unit of geographic data that includes both the location, or geometry, and the descriptive attributes.

The term layer is used to refer to a geographic data file once it is loaded in the software.

So here goes…

What neighborhood is each park is located in?

parks_w_nhoods <- over(parks, sf_nhoods)

Did it work?

parks_w_nhoods <- over(parks, sf_nhoods)

Coordinate reference systems (CRS) must be the same!

CRSs must be the same

The over function, like almost all spatial analysis functions, requires that both data sets be spatial objects (they are) with the same coordinate reference system (CRS). Let's investigate

# What is the CRS of the parks?
parks@proj4string # or proj4string(parks)
## CRS arguments: NA
# What is the CRS of the sf_nhoods?
sf_nhoods@proj4string
## CRS arguments: +proj=longlat +ellps=WGS84 +no_defs

CRS

We can see from the output that parks does not have a CRS and that the CRS forsf_nhoods is:

  • CRS arguments: +proj=longlat +ellps=WGS84 +no_defs

That CRS string refers to the WGS84 geographic coordinate reference system, which is the most commonly used CRS for latitude and longitude coordinates.

Defining a CRS

Ok, so now we know that the CRS for sf_nhoods is WGS84. We also know that our park SpatialPointsDataFrame was created from longitude & latitude coordinates So, it is safe to assume that these are also WGS84. let's set the CRS of our park points to that of the SF neighborhoods.

Note, we are not doing any transformation on the data. We are just identifying the CRS for the coordinates used by the data. If we define the wrong projection (and we can) we will get errors and erroneous results when we apply spatial operations on the data.

Define the CRS

# Set the CRS for parks to be the same as that for sf_nhoods
proj4string(parks) <- CRS(proj4string(sf_nhoods))

# make sure the CRSs are the same
proj4string(parks) == proj4string(sf_nhoods) 
## [1] TRUE

Now let's try that overlay operation again

# Now try the overlay operation again
# For each feature in parks give me info about the sf_nhoods 
# at the corresponding location
parks_w_nhoods <-over(parks,sf_nhoods)

Questions

What is our output? Does it answer our question?

What type of data object did the over function return?

head(parks_w_nhoods) # take a look at the output
class(parks_w_nhoods)
nrow(parks_w_nhoods)
nrow(parks)

over discussion

Our output parks_w_nhoods is a data frame with the id of each park and the name of the neighborhood in which it is located in the column nhood. So we are close to answering our question. But for the data to be useful we need to link (or join) the name of the parks to the name of the nhoods.

Add over output to input SPDF

# Take a look at the data before we change it
# head(parks@data)

# Now combine the information about neighborhoods with the spatial data 
parks@data <- cbind(parks@data,parks_w_nhoods)  
  ## NOTE - binding to the parks@data not parks!!!

# Review and note the change
head(parks@data)
##                     ParkName                        ParkType Acreage
## 1    29TH/DIAMOND OPEN SPACE Neighborhood Park or Playground    0.82
## 2           ADAM ROGERS PARK Neighborhood Park or Playground    2.74
## 3               ALAMO SQUARE Neighborhood Park or Playground   12.70
## 4 ALICE MARBLE TENNIS COURTS Neighborhood Park or Playground    0.84
## 5                ALLYNE PARK Neighborhood Park or Playground    0.75
## 6                 ALTA PLAZA Neighborhood Park or Playground   11.91
##   ParkID                                                       location
## 1    194       Diamond\nSan Francisco, CA\n(37.74360211, -122.43523589)
## 2     46       Ingalls\nSan Francisco, CA\n(37.73101645, -122.38385466)
## 3    117         Hayes\nSan Francisco, CA\n(37.77634875, -122.43467396)
## 4    151     Greenwich\nSan Francisco, CA\n(37.80142776, -122.42034327)
## 5    131 2609 Gough St\nSan Francisco, CA\n(37.79746066, -122.42759992)
## 6    129       Jackson\nSan Francisco, CA\n(37.79117333, -122.43766978)
##                   nhood
## 1            Noe Valley
## 2 Bayview Hunters Point
## 3          Hayes Valley
## 4          Russian Hill
## 5                Marina
## 6       Pacific Heights

Check in tmap

Map the parks data in tmap interactive mode with a popup showing the park name and the neighborhood

tm_shape(sf_nhoods) + tm_polygons(col="beige") + tm_shape(parks) + 
  tm_symbols(col="green", size=0.04)

Evaluation

Did the over command successfully associate the neighborhoods with the parks?

Recap

We just did what's called a type of spatial overlay query called a Point-in-Polygon query. We asked "in what neighborhood is each park located?". We used a spatial operation to add that information to the parks SpatialPointsDataFrame. This is a very common operation. For example, you may ask "In what census tracts are my address points located?". Once you answer that question with a spatial overlay you can link your address data to census data!

WOW

Data linkage via space!

The Neighborhood perspective

We now know the neighborhood for each park. Now let's think about this question from the neighborhood perspective. First, let's ask the question:

  • What parks are in Noe Valley?

How might we answer that?

Attribute query

We can query by attribute because we made the spatial overlay!

# attribute query because we already made the spatial association
parks[parks$nhood=='Noe Valley',]$ParkName  ## Why doesn't this work?

## what about this syntax?
parks[which(parks$nhood=='Noe Valley'),]$ParkName 

Missing values

One park doesn't have a nhood.

Which one is it?

Use tmap to find out why.

# What park doesn't have a value for nhood?
parks[is.na(parks$nhood=='Noe Valley'),]$ParkName  
## [1] "INDIA BASIN SHORELINE PARK"

Fix it

parks@data[parks$ParkID==48,]$nhood <- "Bayview Hunters Point"

# What parks are in Noe Valley?
parks[parks$nhood=='Noe Valley',]$ParkName  ## Now it works work?
## [1] "29TH/DIAMOND OPEN SPACE"     "DOUGLASS PLAYGROUND"        
## [3] "DUNCAN/CASTRO OPEN SPACE"    "NOE VALLEY COURTS"          
## [5] "UPPER NOE RECREATION CENTER"
# How many Parks are in Noe Valley?
length(parks[parks$nhood=='Noe Valley',]$ParkName)
## [1] 5

Warning

Data misalignment issues are common!

Aggregating Data Spatially

Question: How many parks are in each Neighborhood?

To answer this we need to

  • perform a Point-in-Polygon query like we did with over

  • count the number of parks within each neighborhood

We can use the sp::aggregate function for this.

Note, sp::aggregate is a spatial extention of stats::aggregate.

sp::aggregate

Make sure sp is loaded or the aspatial aggregate function will be used.

Try both of the following and compare the output:

See ?sp::aggregate for details

nhood_park_count0.df <- aggregate(x = parks, by = sf_nhoods, 
                                  FUN = length)

nhood_park_count.df <- aggregate(x = parks["ParkID"], by = sf_nhoods, 
                                 FUN = length)

Saving the results

Add the output of the aggregate command to the SF Neighborhoods layer.

# Some checks first
#head(nhood_park_count.df@data)
#nrow(sf_nhoods)
#nrow(nhood_park_count.df)

sf_nhoods$park_count <- nhood_park_count.df$ParkID

head(sf_nhoods@data)
##                            nhood park_count
## 0          Bayview Hunters Point          9
## 1                 Bernal Heights          3
## 2            Castro/Upper Market          5
## 3                      Chinatown          2
## 4                      Excelsior          2
## 5 Financial District/South Beach          2

Map it with tmap

Create a choropleth map of the SF Neighborhoods by the count of parks.

Challenge: add popup vars so we see count and nhood name on click

  • What neighborhood has the most parks?
tm_shape(sf_nhoods) + tm_polygons(col="park_count", palette="Reds", 
                                  popup.vars=c("nhood","park_count")) 

What Neighborhood has the most park acreage?

Use the aggregate command to sum the acres in parks by nhood

  • rather than the sum like we did before.

How do we need to change the previous function to do this?

.

Changing the aggregate function

# compute the park acreage within each neighborhood
nhood_park_acres.df <- aggregate(x = parks["Acreage"], 
                                 by = sf_nhoods, FUN = sum)

# Add the acreage sums back to the SF Neighborhoods layer
sf_nhoods$park_acreage <- nhood_park_acres.df$Acreage

Map it

Then make a choropleth map of neighborhoods by park acres

tm_shape(sf_nhoods) + tm_polygons(col="park_acreage", palette="Reds", 
                        popup.vars=c("nhood","park_count", "park_acreage"))

Computing Area

Now that we know the park acreage in each neighborhood, let's compute the proportion of area in each neighborhood that is a park.

To do this, we first need to compute the acreage of each neighborhood.

We can do this with the rgeos function gArea

?gArea

rgeos

The rgeos library is widely used for spatial operations.

Most rgeos commands begin with the letter g for ….

In order to compute spatial measurements, the input spatial data must have a projected CRS.

  • This is not True for spatial relationship queries, like PIP, however.

spTransform

Let's transform or layers to a projected CRS that is good for San Francisco.

We will use UTM Zone 10, NAD83 CRS which has the code epsg:26910.

Note, the units for UTM CRSs are meters

Projected CRS

Convert layers to UTM Zone 10, NAD83

parks_utm10 <- spTransform(parks, CRS("+init=epsg:26910"))
sfhoods_utm10 <- spTransform(sf_nhoods, CRS("+init=epsg:26910"))

Take a look at the @bbox slot for these layers to see that the coords are no longer lat/lon.

Calculate Area

Try the following and review the output.

What is the output?

What are the units?

gArea(sfhoods_utm10)

gArea(sfhoods_utm10, byid=TRUE)

gArea

# Add the output from gArea to sfhoods_utm10 as `nhood_area_m2`
sfhoods_utm10$nhood_area_m2 <- gArea(sfhoods_utm10, byid=T)

# Convert to acreage (1 sqmeter = 0.000247105 acres) as `nhood_acreage`
sfhoods_utm10$nhood_acreage <- sfhoods_utm10$nhood_area_m2 * 0.000247105

# Create a new column that is proportion of park_acreage / nhood_acreage 
sfhoods_utm10$prop_park_acres <- sfhoods_utm10$park_acreage / sfhoods_utm10$nhood_acreage

Map it

Create a tmap choropleth map of neighborhoods by proportion of park area.

tm_shape(sfhoods_utm10) + tm_polygons(col="park_acreage", palette="Reds",
                                popup.vars=c("nhood","park_count", "park_acreage", 
                                             "nhood_acreage", "prop_park_acres"))

Questions?

We the People

No discussion of parks and neighborhoods is complete without bringing in population.

Let's add SF Census tracts with population data to our analysis.

We will use these to compute the number of people in each neighborhood.

SF Census Tracts

Use readOGR to load the shapefile sf_pop_by_tracts

Then take a look at the data.

  • What type of sp object is it?
  • What is its CRS?
  • What column contains the population data?

SF Census Tracts

sftracts <- readOGR(dsn="./data", layer="sf_pop_by_tracts")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "sf_pop_by_tracts"
## with 196 features
## It has 10 fields
head(sftracts@data)
##   STATEFP COUNTYFP TRACTCE             AFFGEOID       GEOID   NAME LSAD
## 0      06      075  010700 1400000US06075010700 06075010700    107   CT
## 1      06      075  012201 1400000US06075012201 06075012201 122.01   CT
## 2      06      075  013102 1400000US06075013102 06075013102 131.02   CT
## 3      06      075  013500 1400000US06075013500 06075013500    135   CT
## 4      06      075  015500 1400000US06075015500 06075015500    155   CT
## 5      06      075  016300 1400000US06075016300 06075016300    163   CT
##    ALAND AWATER pop14
## 0 183170      0  5311
## 1  92048      0  4576
## 2 237886      0  2692
## 3 235184      0  2592
## 4 311339      0  3918
## 5 245867      0  4748
proj4string(sftracts)
## [1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"

Transform the CRS to UTM10

The SF Tracts data are in geographic coordinates (what CRS?).

Use spTransform to create a new layer in the UTM10 NAD83 CRS.

sftracts_utm10 <- spTransform(sftracts, CRS("+init=epsg:26910"))

head(sftracts_utm10)
## class       : SpatialPolygonsDataFrame 
## features    : 6 
## extent      : 549029.7, 552338.1, 4180865, 4184030  (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:26910 +proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 
## variables   : 10
## names       : STATEFP, COUNTYFP, TRACTCE,             AFFGEOID,       GEOID, NAME, LSAD,  ALAND, AWATER, pop14 
## min values  :      06,      075,  010700, 1400000US06075010700, 06075010700,  107,   CT,  92048,      0,  2592 
## max values  :      06,      075,  016300, 1400000US06075016300, 06075016300,  163,   CT, 311339,      0,  5311

Map the Tracts

Map a tmap of the tracts and the neighborhood layers.

tm_shape(sftracts_utm10) + tm_polygons(border.col="blue", 
                                       col="pop14", palette="Reds") + 
  tm_shape(sfhoods_utm10) + tm_polygons(alpha=0, border.col="yellow", lwd=2)

Transfering Data

What is the strategy for transferring data from tracts to neighborhoods?

Fortunately, the SF Census Tracts are nested within the neighborhoods.

Why is that good for computing population for each neighborhood?

Alignment issues

Unfortunately, the data are not aligned.

This makes polygon-polygon overlay even more challenging.

Polygons to Points

Since all the tracts are within a neighborhood, the centerpoint or centroid of the tract should also be.

We can reduce this to a point-in-polygon overlay by transforming the sftracts_utm10 layer to a point layer.

We can do this as follows…

# Convert the SpatialPolygonsDataFrame to a SpatialPointsDataFrame
sftract_ctrs <- SpatialPointsDataFrame(sftracts_utm10, 
                                       data=sftracts_utm10@data)

# define the CRS
proj4string(sftract_ctrs) <- CRS(proj4string(sftracts_utm10))

Map it

tm_shape(sftracts_utm10) + tm_polygons(border.col="blue", col="pop14",
                                       palette="Reds") +
  tm_shape(sfhoods_utm10) + tm_polygons(alpha=0, border.col="yellow", lwd=2) +
  tm_shape(sftract_ctrs) + tm_symbols(col="black", size=.5)

Overlay

Compute the neighborhood in which each SF census tract centroid falls.

# over
nhood_and_tract_over_output <- over(sftract_ctrs,sfhoods_utm10)

# We only want the nhood column
tract_nhood_only <-nhood_and_tract_over_output[c("nhood")]

# check it
nrow(tract_nhood_only) == nrow(sftract_ctrs)
## [1] TRUE
# Add the neighborhood for each tract vector to the sftracts SPDF
sftracts_utm10@data <- cbind(sftracts_utm10@data,tract_nhood_only)

Aggregate population by Neighborhood

Use aggregate to sum the pop14 values for each nhood

Try to figure it out..

aggregate

Sum pop14 by neighborhood

pop_by_hood <- aggregate(pop14 ~ nhood, sftracts_utm10, sum)

# Take a look
#View(pop_by_hood)

Check it

Is the sum of of pop14 for all neighborhoods equal to that for all tracts?

#global sum
sum(pop_by_hood$pop14)
## [1] 825971
#is it the same as the sum of pop14 in the sftracts_utm10 layer?
sum(sftracts_utm10$pop14)
## [1] 829072

Add Pop14 to the SF Neighborhoods

and compute a density measure of people/park acres

# Add the acreage sums back to the SF Neighborhoods layer
sfhoods_utm10$pop14 <- pop_by_hood$pop14

# compute the pop density
sfhoods_utm10$people_per_park <- sfhoods_utm10$pop14/ 
                          sfhoods_utm10$park_acreage

Map it

Make an SF neighborhoods choropleth map of people per park.

  • add the popup.vars to show park name, population, number of parks etc.

Is there an obvious neighborhood to consider for locating a new park?

tm_shape(sfhoods_utm10) + tm_polygons(col="people_per_park", 
          popup.vars=c("nhood","pop14","people_per_park",
                       "park_acreage","park_count")) 

Save our output to new files

When you do a lot of processing its a good idea to save your work.

writeOGR(sfhoods_utm10, ".","sf_park_nhoods", driver="ESRI Shapefile")

Proximity

Distance queries

Are any Parks within 1 km of Coit Tower?

Where is Coit Tower? Let's ask Google by using the ggmap package.

library(ggmap)
library(ggplot2)
coit_tower <- geocode('Coit Tower, San Francisco, CA')
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Coit%20Tower,%20San%20Francisco,%20CA&sensor=false
coit_tower_pt <- c(coit_tower$lon, coit_tower$lat) 
coit_tower_pt
## [1] -122.40582   37.80239

What Park is Closet

to Coit Tower?

We can use the spDistsN1 function to compute this.

dist_matrix <- spDistsN1(parks,coit_tower_pt, longlat=T)
parks$coit_distkm <- dist_matrix

#View(parks@data)

What Parks are within 1 KM of

of Coit Tower?

We can use the rgeos::gBuffer function for this.

Can you understand this code?

coordinates(coit_tower) <- c("lon","lat")
proj4string(coit_tower)<- CRS("+init=epsg:4326")

coit_tower_utm <- spTransform(coit_tower, CRS("+init=epsg:26910"))
coit_km_buffer <- gBuffer(coit_tower_utm, width=1000)

#map check
#qtm(coit_tower_utm)
#qtm(coit_km_buffer)

Challenge

Now that we have our buffer polygon, what parks are in it?

Use over to find out!

near_coit <- over(coit_km_buffer,parks_utm10, returnList = T)

# Take a look at output
head(near_coit)
## $buffer
##                    ParkName                        ParkType Acreage ParkID
## 61  JOE DIMAGGIO PLAYGROUND Neighborhood Park or Playground    2.52    150
## 76  MICHELANGELO PLAYGROUND Neighborhood Park or Playground    0.44    148
## 91        PORTSMOUTH SQUARE Neighborhood Park or Playground    1.29    141
## 120       WOH HEI YUEN PARK Neighborhood Park or Playground    0.31    170
##                                                            location
## 61  651 Lombard St\nSan Francisco, CA\n(37.80248342, -122.41206911)
## 76       Greenwich\nSan Francisco, CA\n(37.80109573, -122.41689916)
## 91      Washington\nSan Francisco, CA\n(37.79483492, -122.40535067)
## 120 922 Jackson St\nSan Francisco, CA\n(37.79584858, -122.41033649)
##            nhood
## 61   North Beach
## 76  Russian Hill
## 91     Chinatown
## 120    Chinatown

Questions?

Summary

That was a whirlwind tour of just some of the methods of spatial analysis.

We just barely mentioned geocoding.

There was a lot we didn't and can't cover.

Raster data is a another major topic!

Spring semester classes

  • Geog 88
  • Geog 187
  • LA221
  • CyPlan 204C
  • LA289

The future is sf

Selected References

Output code to script

library(knitr)
purl("r-geospatial-workshop-f2017-pt2.Rmd", output = "scripts/r-geospatial-workshop-f2017-pt2.r", documentation = 1)